53997da5d vs 03a1b695e - NVFuser codegen diff

53997da5d
53997da5d Use set allocation domains if used by TMA or MmaOp (#4234)
Naoya Maruyama <naoyam@users.noreply.github.com>
Thu Apr 10 18:40:23 2025 -0700

03a1b695e
03a1b695e temp
Naoya Maruyama <nmaruyama@nvidia.com>
Fri Apr 11 08:59:59 2025 -0700

Command: build/test_view
GPUs: 8 x NVIDIA H100 80GB HBM3
matches between runs
matches between runs
matches between runs

Test Diffs

1: GpuViewTest.FusionReshapePersistentShmoo
  Kernel 32    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 36    -8 +8    index type: int registers: 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 42    -8 +8    index type: int registers: 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 71    -8 +8    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 79    -8 +8    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 90    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 133    -9 +9    index type: int registers: 15 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 140    -9 +9    index type: int registers: 15 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 145    -2 +2    index type: int registers: 39 → 23 gmem: 3 static smem: 4 → 16 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 149    -2 +2    index type: int registers: 39 → 23 gmem: 3 static smem: 4 → 16 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 171    -10 +10    index type: int registers: 24 gmem: 3 static smem: 4 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 178    -9 +9    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 187    -10 +10    index type: int registers: 24 gmem: 3 static smem: 4 stack frame: 0 spill stores: 0 spill loads: 0

2: GpuViewTest.FusionReshapeSplit
  Kernel 1    -2 +2    index type: int registers: 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 2    -4 +4    index type: int registers: 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

3: GpuViewTest.FusionReshapeBroadcast
  Kernel 2    -4 +4    index type: int registers: 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

4: GpuViewTest.FusionReshapeAllShmoo
  Kernel 20    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 60    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

5: GpuViewTest.FusionReshapeStride
  Kernel 19    -1 +1    index type: int registers: 32 → 30 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 20    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 21    -1 +1    index type: int registers: 28 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 25    -1 +1    index type: int registers: 28 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 39    -1 +1    index type: int registers: 23 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 41    -1 +1    index type: int registers: 23 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 59    -1 +1    index type: int registers: 32 → 30 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 60    -4 +4    index type: int registers: 18 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

6: GpuViewTest.FusionReshapeConcreteDomain3
  Kernel 1    -2 +2    index type: int registers: 32 gmem: 3 static smem: 4 stack frame: 0 spill stores: 0 spill loads: 0

7: GpuViewTest.FusionReshapeMagicSchedule8
  Kernel 1    -19 +19    index type: int registers: 80 → 72 gmem: 27 static smem: 16 stack frame: 32 spill stores: 0 spill loads: 0

8: GpuViewTest.FusionIssue2076
  Kernel 1    -1 +1    index type: int registers: 30 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

9: GpuViewTest.GroupNormOriginal
  Kernel 2    -1 +1    index type: int registers: 40 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

10: GpuViewTest.GroupNormReshapeMovedToOutput
  Kernel 1    -4 +4    index type: int registers: 38 → 35 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

11: GpuViewTest.FusionMismatchingReshape
  Kernel 1    -4 +4    index type: int registers: 18 gmem: 27 static smem: 0 stack frame: 32 spill stores: 0 spill loads: 0

12: GpuViewTest.ReplacedScalarInSplitOutput
  Kernel 2    -1 +1    index type: int registers: 10 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

13: ReshapeReduction.FusionReshapeReduction/42
  Kernel 1    -4 +4    index type: int registers: 16 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

14: ReshapeReduction.FusionReshapeReduction/43
  Kernel 1    -4 +4    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

15: ReshapeReduction.FusionReshapeReduction/45
  Kernel 1    -4 +4    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

16: ReshapeReduction.FusionReshapeReduction/56
  Kernel 1    -4 +4    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

17: ReshapeReduction.FusionReshapeReduction/58
  Kernel 1    -4 +4    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

18: ReshapeReduction.FusionReshapeReduction/62
  Kernel 1    -4 +4    index type: int registers: 16 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

19: ReshapeReduction.FusionReshapeReduction/68
  Kernel 1    -1 +1    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

20: ReshapeReduction.FusionReshapeReduction/84
  Kernel 2    -4 +4    index type: int registers: 16 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

21: ReshapeReduction.FusionReshapeReduction/88
  Kernel 1    -21 +18    index type: int registers: 0 gmem: 0 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 2    -18 +21    index type: int registers: 15 → 14 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
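
Sketch: extracting register changes from the kernel lines

The kernel summary lines above can be scanned mechanically to list which kernels changed register usage between the two commits. The following is a minimal sketch, assuming the line layout shown in this report (a "Kernel N" label, removed/added line counts, then "index type:" and "registers:" fields, with "old → new" when a value differs). The names KERNEL_RE and parse_kernel_line are hypothetical helpers for illustration and are not part of NVFuser or its diff tooling.

```python
import re

# Assumed layout of one kernel summary line, taken from this report.
# Whitespace around the diff counts and the arrow is treated as optional.
KERNEL_RE = re.compile(
    r"Kernel\s+(?P<kernel>\d+)\s+-(?P<removed>\d+)\s+\+(?P<added>\d+)\s*"
    r"index type:\s*(?P<index_type>\w+)\s+"
    r"registers:\s*(?P<reg_old>\d+)(?:\s*→\s*(?P<reg_new>\d+))?"
)


def parse_kernel_line(line):
    """Return a dict for a kernel summary line, or None if it does not match."""
    m = KERNEL_RE.search(line)
    if m is None:
        return None
    reg_old = int(m["reg_old"])
    # When no arrow is present, the register count is unchanged.
    reg_new = int(m["reg_new"]) if m["reg_new"] is not None else reg_old
    return {
        "kernel": int(m["kernel"]),
        "removed": int(m["removed"]),
        "added": int(m["added"]),
        "index_type": m["index_type"],
        "registers_old": reg_old,
        "registers_new": reg_new,
        "register_delta": reg_new - reg_old,
    }


if __name__ == "__main__":
    # Two lines copied from test 1 of this report.
    sample = [
        "  Kernel 140    -9 +9 index type: int registers: 15 gmem: 3 static smem: 16",
        "  Kernel 145    -2 +2 index type: int registers: 39 → 23 gmem: 3 static smem: 4 → 16",
    ]
    for line in sample:
        info = parse_kernel_line(line)
        if info and info["register_delta"] != 0:
            print(info["kernel"], info["registers_old"], "->", info["registers_new"])
```

Only the register field is extracted here; the same optional "old → new" group could be repeated for gmem, static smem, and the spill counters if those are of interest.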